I’d kinda like to wrap this whole section in a thought-bubble, or quote block, or color, or something, to indicate that the entire section is “what it looks like from inside a human’s mind”. So e.g. from inside my mind, it looks like we humans learn about our values. And then outside that bubble, we can ask “are there any actual ‘values’ which we’re in fact learning about”?
Seems accurate to me. This has been an exercise in the initial step(s) of CCC, which indeed consist of “the phenomenon looks this way to me. It also looks that way to others? Cool. What are we all cottoning on to?”
Indeed, our beliefs-about-values can be integrated into the same system as all our other beliefs, allowing for e.g. ordinary factual evidence to become relevant to beliefs about values in some cases.
Super unclear to the uninitiated what this means. (And therefore threateningly confusing to our future selves.)
Maybe: “Indeed, we can plug ‘value’ variables into our epistemic models (like, for instance, our models of what brings about reward signals) and update them as a result of non-value-laden facts about the world.”
Ahhhh
Maybe: “But presumably the reward signal does not plug directly into the action-decision system.”?
Or: “But intuitively we do not value reward for its own sake.”?
language
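To make that concrete, here is a toy sketch (nothing from the draft; the candidate hypotheses and numbers are all invented) of a ‘value’ variable treated as a latent inside an ordinary epistemic model of what produces reward signals, updated by a non-value-laden fact about the world:

```python
# Toy illustration only: a "value" variable embedded in an epistemic model of
# what brings about reward, updated by ordinary factual evidence. Every name
# and number here is made up for the sketch.

# Candidate hypotheses about what my reward signal is actually tracking.
candidates = ["social_approval", "sugar", "novelty"]

# Prior belief over which candidate my values point at.
belief = {c: 1.0 / len(candidates) for c in candidates}

def reward_likelihood(candidate, situation, reward_observed):
    """P(reward spike | this candidate is what reward tracks, ordinary facts)."""
    p_spike = 0.9 if situation.get(candidate, False) else 0.1
    return p_spike if reward_observed else 1.0 - p_spike

def update(belief, situation, reward_observed):
    """Plain Bayesian update: factual evidence moves beliefs about values."""
    posterior = {c: belief[c] * reward_likelihood(c, situation, reward_observed)
                 for c in belief}
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()}

# A non-value-laden fact: the situation had sugar and novelty, no social
# approval, and a reward spike followed.
situation = {"social_approval": False, "sugar": True, "novelty": True}
belief = update(belief, situation, reward_observed=True)
print(belief)  # probability mass shifts away from "social_approval"
```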
an agent could aim to pursue any values regardless of what the world outside it looks like; “how the external world is” does not tell us “how the external world should be”.
Extremely delicate wording dancing around the “should be” vs “should be according to me” distinction, with embeddedness allowing facts to update “should be according to me” without crossing the is-ought gap… in principle.
Wait. I thought that was crossing the is-ought gap. As I think of it, the is-ought gap refers to the apparent type-clash and unclear evidential entanglement between facts-about-the-world and values-an-agent-assigns-to-facts-about-the-world. And also as I think of it, “should be” is always shorthand for “should be according to me”, though it possibly means some kind of aggregated thing which still grounds out in subjective shoulds.
So “how the external world is” does not tell us “how the external world should be” … except insofar as the external world has become causally/logically entangled with a particular agent’s ‘true values’. (Punting on what an agent’s “true values” are, as opposed to the much easier “motivating values” or possibly “estimated true values.” But for the purposes of this comment, it’s sufficient to assume that they are dependent on some readable property (or logical consequence of readable properties) of the agent itself.)
Needs jargon
also needs jargon
...
wiggitywiggitywact := fact about the world which requires a typical human to cross a large inferential gap.
wact := fact about the world
mact := fact about the mind
aact := fact about the agent more generally
vwact := value assigned by some agent to a fact about the world
Spitballing:
“local fact” vs “global fact” (to evoke local/global variables)
“local fact” vs “interoperable fact”
“internal fact” vs “interoperable fact”
“fact valence” for the value stuff
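If it helps, here is the spitballing above rendered as toy types (purely illustrative; none of these names are settled, and the one function is a stub):

```python
# Toy types for the spitballed jargon; illustration only, not settled terminology.
from dataclasses import dataclass

@dataclass
class WorldFact:                      # "wact": a fact about the world
    claim: str

@dataclass
class WiggityWiggityWact(WorldFact):  # world-fact behind a large inferential gap
    pass

@dataclass
class MindFact:                       # "mact": a fact about the mind
    agent: str
    claim: str

@dataclass
class AgentFact:                      # "aact": a fact about the agent more generally
    agent: str
    claim: str

@dataclass
class ValueAssignment:                # "vwact": value some agent assigns to a world-fact
    agent: str
    fact: WorldFact
    valence: float                    # "fact valence" for the value stuff

def assign_value(agent_facts: list[AgentFact], fact: WorldFact) -> ValueAssignment:
    """Type-level version of the is-ought point: a bare WorldFact alone doesn't
    yield a ValueAssignment; facts about the agent have to enter somewhere."""
    agent = agent_facts[0].agent if agent_facts else "unknown"
    return ValueAssignment(agent=agent, fact=fact, valence=0.0)  # placeholder valence
```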
It does seem like humans have some kind of physiological “reward”, in a hand-wavy reinforcement-learning-esque sense, which seems to at least partially drive the subjective valuation of things.
Hrm… If this compresses down to, “Humans are clearly compelled at least in part by what ‘feels good’.” then I think it’s fine. If not, then this is an awkward sentence and we should discuss.
an agent could aim to pursue any values regardless of what the world outside it looks like;
Without knowing what values are, it’s unclear that an agent could aim to pursue any of them. The implicit model here is that there is something like a value function in DP which gets passed into the action-decider along with the world model and that drives the agent. But I think we’re saying something more general than that.
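For concreteness, a toy version of that implicit model (assuming “DP” here means dynamic programming; the states, rewards, and transitions are invented): a value function computed against a world model, then handed to the action-decider:

```python
# Toy sketch, not the draft's model: a DP value function derived from a world
# model, passed to an "action-decider" that picks actions greedily against it.

states = ["hungry", "fed"]
actions = ["eat", "wait"]
transition = {("hungry", "eat"): "fed", ("hungry", "wait"): "hungry",
              ("fed", "eat"): "fed", ("fed", "wait"): "hungry"}
reward = {("hungry", "eat"): 1.0, ("hungry", "wait"): -0.1,
          ("fed", "eat"): 0.0, ("fed", "wait"): 0.0}
gamma = 0.9

# Value iteration over the world model.
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {s: max(reward[(s, a)] + gamma * V[transition[(s, a)]] for a in actions)
         for s in states}

def decide(state, world_model, V):
    """The action-decider: takes the world model plus the value function;
    on this picture, 'values' enter only through V."""
    trans, rew = world_model
    return max(actions, key=lambda a: rew[(state, a)] + gamma * V[trans[(state, a)]])

print(decide("hungry", (transition, reward), V))  # -> "eat"
```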
Better terminology for the phenomenon of “making sense” in the above way?
Every time the wording of a sentence implies that there are, in fact, some values which someone has or estimates, I picture the adorable not-so-sneaky elephant.
“learn” in the sense that their behavior adapts to their environment.
I want a new word for this. “Learn” vs “Adapt” maybe. Learn means updating of symbolic references (maps) while Adapt means something like responding to stimuli in a systematic way.
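A toy contrast for that proposed wording (illustration only; both systems and all numbers are made up). Both change behavior with experience; the intended difference is that only the second one’s internal state is an estimate of some external quantity, i.e. a map that can be accurate or wrong:

```python
# Illustration only: "Adapt" vs "Learn" as two systems that both change with
# experience, where only one maintains anything map-like about the world.

class Adapter:
    """Adapt: respond to stimuli in a systematic way. The internal number is a
    behavioral disposition; there is no external quantity it purports to match."""
    def __init__(self):
        self.aggression = 0.5                 # disposition, not a representation

    def respond(self, feedback):
        self.aggression += 0.1 * feedback     # nudge behavior by raw feedback
        return "fight" if self.aggression > 0.5 else "flee"

class Learner:
    """Learn: update a map of the territory, then read actions off the map."""
    def __init__(self):
        self.estimated_predator_density = 0.5  # a claim about the external world

    def observe(self, predators_seen, area_scanned):
        observed = predators_seen / area_scanned
        self.estimated_predator_density += 0.2 * (observed - self.estimated_predator_density)

    def respond(self):
        return "flee" if self.estimated_predator_density > 0.3 else "forage"

# Structurally the two updates are near-identical, which is roughly why the
# Learn/Adapt split needs sharper vocabulary than this sketch provides.
adapter, learner = Adapter(), Learner()
learner.observe(predators_seen=2, area_scanned=10.0)
print(adapter.respond(feedback=-0.2), learner.respond())  # -> flee flee
```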
… there’s a whole fucking elephant swept under that rug. I can see its trunk peeking out. It’s adorable how sneaky that elephant thinks it’s being.
We have at least one jury-rigged idea! Conceptually. Kind of.
I give up.
The internal heuristics or behaviors “learned” by an adaptive system are not necessarily “about” any particular external thing, and don’t necessarily represent any particular external thing
Yeeeahhh.… But maybe it’s just awkwardly worded rather than being deeply confused. Like: “The learned algorithms which an adaptive system implements may not necessarily accept, output, or even internally use data(structures) which have any relationship at all to some external environment.” “Also what the hell is ‘reference’.”
So much screaming
more scream
Adaptive systems “learn” things, but they don’t necessarily “learn about” things; they don’t necessarily have an internal map of the external territory.
Seconded. I have extensional ideas about “symbolic representations” and how they differ from… non-representations… but I would not trust this understanding with much weight.
scream
Seconded. Comments above.